Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications
Abstract
What are the natural loss functions or fitting criteria for binary class probability estimation? This question has a simple answer: so-called “proper scoring rules”, that is, functions that score probability estimates in view of data in a Fisher-consistent manner. Proper scoring rules comprise most loss functions currently in use: log-loss, squared error loss, boosting loss, and, as limiting cases, cost-weighted misclassification losses. Proper scoring rules have a rich structure:
• Every proper scoring rule is a mixture (limit of sums) of cost-weighted misclassification losses. The mixture is specified by a weight function (or measure) that describes which misclassification cost weights are most emphasized by the proper scoring rule.
• Proper scoring rules permit Fisher scoring and Iteratively Reweighted LS algorithms for model fitting. The weights are derived from a link function and the above weight function.
• Proper scoring rules are in a 1-1 correspondence with information measures for tree-based classification.
• Proper scoring rules are also in a 1-1 correspondence with Bregman distances that can be used to derive general approximation bounds for cost-weighted misclassification errors, as well as generalized bias-variance decompositions.
We illustrate the use of proper scoring rules with novel criteria for 1) Hand and Vinciotti’s (2003) localized logistic regression and 2) interpretable classification trees. We also discuss connections with the exponential loss used in boosting.
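The mixture structure described in the abstract can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes the standard integral form of the mixture: the loss for y = 1 is the integral of (1 - c) ω(c) over c in [q, 1], and the loss for y = 0 is the integral of c ω(c) over c in [0, q], where q is the probability estimate, c is the misclassification cost and ω is the weight function. The name `partial_losses` and the two weight functions are illustrative choices; under this assumption ω(c) = 1/(c(1 - c)) should recover log-loss and ω(c) = 2 should recover squared error loss.

```python
import math


def partial_losses(weight, q, n=20_000):
    """Approximate the two partial losses of a proper scoring rule at estimate q.

    Assumed mixture form (not taken verbatim from the abstract):
      loss when y = 1:  integral over c in [q, 1] of (1 - c) * weight(c)
      loss when y = 0:  integral over c in [0, q] of c * weight(c)
    Integrals are evaluated with a simple midpoint rule on n subintervals.
    """
    def midpoint(f, a, b):
        h = (b - a) / n
        return h * sum(f(a + (i + 0.5) * h) for i in range(n))

    loss_y1 = midpoint(lambda c: (1.0 - c) * weight(c), q, 1.0)
    loss_y0 = midpoint(lambda c: c * weight(c), 0.0, q)
    return loss_y1, loss_y0


def log_loss_weight(c):
    # Weight function assumed to generate log-loss.
    return 1.0 / (c * (1.0 - c))


def squared_error_weight(c):
    # Constant weight assumed to generate squared error loss.
    return 2.0


if __name__ == "__main__":
    q = 0.3
    print("log-loss via mixture:     ", partial_losses(log_loss_weight, q))
    print("log-loss closed form:     ", (-math.log(q), -math.log(1.0 - q)))
    print("squared error via mixture:", partial_losses(squared_error_weight, q))
    print("squared error closed form:", ((1.0 - q) ** 2, q ** 2))
```

With q = 0.3, the numeric integrals agree, up to integration error, with -log(0.3) and -log(0.7) for the first weight function and with 0.49 and 0.09 for the second. The log-loss weight puts most mass on costs near 0 and 1, whereas the squared error weight treats all costs equally.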
Similar resources
Bayesian Estimation of Shift Point in Shape Parameter of Inverse Gaussian Distribution Under Different Loss Functions
In this paper, a Bayesian approach is proposed for shift point detection in an inverse Gaussian distribution. In this study, the mean parameter of the inverse Gaussian distribution is assumed to be constant, and shift points in the shape parameter are considered. First, the posterior distribution of the shape parameter is obtained. Then the Bayes estimators are derived under a class of priors and using variou...
Dimensionality Reduction of Hyperspectral Data to Increase Class Separability and Preserve Data Structure
Hyperspectral imaging, by gathering hundreds of spectral bands from the surface of the Earth, allows us to separate materials with similar spectra. Hyperspectral images can be used in many applications such as estimation of chemical and physical land parameters, classification, target detection, unmixing, and so on. Among these applications, classification is of particular interest. A hyperspectral im...
High-dimensional pseudo-logistic regression and classification with applications to gene expression data
High-dimension, low-sample-size data, such as microarray gene expression levels, pose numerous challenges to conventional statistical methods. In the particular case of binary classification, some classification methods, such as the support vector machine (SVM), can efficiently deal with high-dimensional predictors, but lack accuracy in estimating the probability of membership of a class. ...
New Indices for Monitoring the Impact of the Equipment Supply Cycle on Electricity Distribution System Losses
With the restructuring of electrical power distribution utilities, energy loss management is one of the critical activities for enhancing network efficiency. Accordingly, monitoring technical losses and planning strategically for loss reduction are main targets of the asset manager. For loss monitoring, the usual methods of loss calculation and estimation need numerous data which is not used mostly to utili...
On surrogate loss functions and $f$-divergences
The goal of binary classification is to estimate a discriminant function γ from observations of covariate vectors and corresponding binary labels. We consider an elaboration of this problem in which the covariates are not available directly, but are transformed by a dimensionality-reducing quantizer Q. We present conditions on loss functions such that empirical risk minimization yields Bayes co...